Adaptive and reusable processing of retroactive sequences for automated predictions

ABSTRACT

Systems and methods for predicting the outcome of a business entity are presented. In embodiments, a system may receive explicit data reporting or indicating activities of a business entity, and other data from which information regarding the activities or level of operations of the entity may be inferred. Using one or more data processors, the system may generate inferred data regarding the business entity from a selected portion of the other data, and use at least some of the explicit data and the inferred data to determine which one of a series of defined sequential active states of development the entity currently is in. The system may further, using the result of the determination as the current state of the business, predict a final stage of the business entity, and a probability of evolving to that final stage from the current state. Other embodiments may be disclosed or claimed.

PRIORITY

This application is a continuation of U.S. patent application Ser. No. 15/458,931, filed on Mar. 14, 2017, which is a non-provisional of and claims priority benefit to each of U.S. Provisional Patent Application 62/307,918, filed on Mar. 14, 2016, and U.S. Provisional Patent Application No. 62/308,095, filed on Mar. 14, 2016, all of which are incorporated herein by reference in their entirety.

TECHNICAL FIELD

The present disclosure relates to the field of database creation and use, in particular, to systems for inferring an active state of a company from a variety of data sources, predicting a final state that the company will likely evolve to from the current active state, and the probability of evolving to such a final state, and associated methods or computer readable storage media.

BACKGROUND

Early-stage investors in companies are often characterized as following one of a “gut-driven”, “term-driven” or “lemming-like” approach. Gut-driven investors primarily follow their instincts about specific companies in making decisions. Term-driven investors seek to maximize potential returns by focusing on companies that offer better financial terms than others. Lemming-like investors tend to let others identify promising opportunities and follow them, frequently co-investing in companies that others feel are promising. None of these approaches involves a detailed, quantified basis for decision making.

It would be advantageous to have a technical tool for evaluating investment opportunities to identify those which are likely to be more successful than others. Such a tool would allow an investor to evaluate startups early in their evolution and to predict the success they are likely to achieve.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments will be readily understood by the following detailed description in conjunction with the accompanying drawings. To facilitate this description, like reference numerals designate like structural elements. Embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings.

FIG. 1 illustrates an example Complex Hidden Markov Model of five hidden states of a company's life cycle, where each active state subsumes multiple events, in accordance with various embodiments.

FIG. 2 illustrates the five hidden active states of FIG. 1 with four final states, and one actual active state, in accordance with various embodiments.

FIG. 3 illustrates a high level view of the operational flow of a process for collecting data, constructing a model, and generating a score for a company under analysis, in accordance with various embodiments.

FIG. 4 illustrates the gathering and processing of information regarding companies or entities to generate a CSV database, in accordance with various embodiments.

FIG. 5 illustrates a screen from an exemplary software environment in accordance with various embodiments showing a set of jobs, contexts, code and templates that may be used to import, transform and generate a CSV database as shown in FIG. 4, ready to model.

FIG. 6 illustrates a screen from an exemplary software environment illustrating the importation of basic files into a local database, in accordance with various embodiments.

FIG. 7 illustrates a screen from an exemplary software environment illustrating the extension of companies with additional data, in accordance with various embodiments.

FIGS. 8-9 illustrate a screen from an exemplary software environment illustrating the aggregation of data by year, country, state, region, and city, and by year, country, state, region, city and status, in accordance with various embodiments.

FIG. 10 illustrates a screen from an exemplary software environment illustrating the creation of a final dataset structure, in accordance with various embodiments.

FIG. 11 illustrates a screen from an exemplary software environment illustrating the generation of basic features and founded previous years' metrics, in accordance with various embodiments.

FIG. 12 illustrates a screen from an exemplary software environment illustrating the generation of founded previous years' metrics by status, in accordance with various embodiments.

FIG. 13 illustrates a screen from an exemplary software environment illustrating the generation of founded previous years' metrics by city-status, in accordance with various embodiments.

FIG. 14 illustrates a screen from an exemplary software environment illustrating the generation of founded next years' metrics, in accordance with various embodiments.

FIG. 15 illustrates a screen from an exemplary software environment illustrating the generation of on_exit previous years' metrics, in accordance with various embodiments.

FIG. 16 illustrates a screen from an exemplary software environment illustrating the generation of success and patent metrics, in accordance with various embodiments.

FIG. 17 illustrates a screen from an exemplary software environment illustrating the generation of team, roles and Twitter™ metrics, in accordance with various embodiments.

FIG. 18 illustrates a screen from an exemplary software environment illustrating the generation of an output dataset, in accordance with various embodiments.

FIG. 19 illustrates a screen from an exemplary software environment illustrating the generation of a final score with a confidence interval for a given company, in accordance with various embodiments.

FIG. 20 illustrates a screen from an exemplary software environment illustrating the calculation of a company's reach value, in accordance with various embodiments.

FIG. 21 illustrates a screen from an exemplary software environment illustrating in detail various components of a final score, and a sampling of a set of scores generated using these components, in accordance with various embodiments.

FIG. 22 illustrates a system for combining individual character profiles of individual team members into a team score for a company, in accordance with various embodiments.

FIG. 22B illustrates details of a feature vector of the system of FIG. 22, in accordance with various embodiments.

FIG. 23 illustrates an example set of attributes of an individual team member, in accordance with various embodiments.

FIG. 24 illustrates combining individual team member profiles using formal mathematical rules, into an aggregate team profile, in accordance with various embodiments.

FIG. 25 illustrates analysis of a team profile relative to a preferred distribution, in accordance with various embodiments.

FIG. 26 illustrates the operational flow of a process to generate a team score, in accordance with various embodiments.

FIG. 27 illustrates a block diagram of a computer device suitable for practicing the present disclosure, in accordance with various embodiments.

FIG. 28 illustrates an example computer-readable storage medium having instructions configured to practice aspects of the processes of the present disclosure, in accordance with various embodiments.

SUMMARY

Systems and methods for predicting the outcome of a business entity are presented. In embodiments, a system may receive explicit data reporting or indicating activities of a business entity, and other data from which information regarding the activities or level of operations of the entity may be inferred. Using one or more data processors, the system may generate inferred data regarding the business entity from a selected portion of the other data, and use at least some of the explicit data and the inferred data to determine which one of a series of defined sequential active states of development the entity currently is in. The system may further, using the result of the determination as the current state of the business, predict a final stage of the business entity, and a probability of evolving to that final stage from the current state. Other embodiments may be disclosed or claimed.

DETAILED DESCRIPTION

Introduction

Examples of systems, apparatus, computer-readable storage media, and methods according to the disclosed implementations are described herein. These examples are being provided solely to add context and aid in the understanding of the disclosed implementations. It will thus be apparent to one skilled in the art that the disclosed implementations may be practiced without some or all of the specific details provided. In other instances, certain process or method operations, also referred to herein as “blocks,” have not been described in detail in order to avoid unnecessarily obscuring the disclosed implementations. Other implementations and applications also are possible, and as such, the following examples should not be taken as definitive or limiting either in scope or setting.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which are implemented via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which are implemented on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

In the following detailed description, references are made to the accompanying drawings, which form a part of the description and in which are shown, by way of illustration, specific implementations. Although these disclosed implementations are described in sufficient detail to enable one skilled in the art to practice the implementations, it is to be understood that these examples are not limiting, such that other implementations may be used and changes may be made to the disclosed implementations without departing from their spirit and scope. For example, the blocks of the methods shown and described herein are not necessarily performed in the order indicated in some other implementations. Additionally, in some other implementations, the disclosed methods may include more or fewer blocks than are described. As another example, some blocks described herein as separate blocks may be combined in some other implementations. Conversely, what may be described herein as a single block may be implemented in multiple blocks in some other implementations. Additionally, the conjunction “or” is intended herein in the inclusive sense where appropriate unless otherwise indicated; that is, the phrase “A, B or C” is intended to include the possibilities of “A,” “B,” “C,” “A and B,” “B and C,” “A and C” and “A, B and C.”

In embodiments, to solve certain predictive tasks, information needs to first be represented and stored using sequences of events that can later be used to build predictive models. When the information regarding past events is simple, certain, obtained all at once or at regular intervals, and from homogeneous sources, then sequences of events can be segmented deterministically using basic rules, and traditional machine learning methods can be applied to build models capable of predicting future events in the sequence using preceding events.

However, when the events are complex and the information that they represent is dynamic, incomplete, uncertain and produced by multiple heterogeneous sources of information, traditional machine learning methods require repeating the model building process every time any of the following occur: the complex structure of an event changes, it is completed with more data, certain data is removed, or the level of uncertainty varies.

It is noted that various measures about private companies can be collected from multiple, distributed sources of information. These measures usually only help represent a partial view of the company, as most often companies only make public certain information they are interested in publicizing. Measures can also be inconsistent, and may change depending on the source, as well as the time at which the source is queried. In this domain, uncertainty stems not only from lack of public information but also from the fact that companies may intentionally hide certain information. It is in fact an adversarial domain.

In embodiments, systems and methods that are capable of efficiently building, adapting and reusing the story of private companies to predict their future stages using unsegmented sequences of complex events coming from multiple, distributed, and often inconsistent, sources of information may be provided. Such overall systems allow a collection of measures for a given company under analysis to be compared to analogous measures for both successful and unsuccessful companies.

In embodiments, methods involve representing, segmenting, and classifying the evolutionary time course of companies from founding to a successful exit event or dissolution, and evaluating where a potential investment lies in its evolution. In embodiments, the methods further involve comparing the initial evolution of a company that is being evaluated as a potential investment to both successful and unsuccessful companies, and their respective life cycles, so as to assess the likelihood that the company being evaluated will succeed.

It is noted that one important factor in the success of a startup company is the team of people working for the company. Accordingly, also disclosed are systems and methods that include characterizing individual team members and combining the individual characterizations into an aggregate characterization of the team. Such systems and methods may use machine learning technologies to evaluate multiple types of data regarding both individuals and teams to predict the likelihood that a company will be successful and therefore be a good investment. In embodiments, these individual and team characterizations can be combined with other measures of company performance relevant to predicting whether the company will succeed or fail.

Although the disclosed system and methods are described in the context of predicting the success or failure of private companies, they may further be applied to many other similar problems such as, for example, reconstructing and predicting the professional career of individuals, entities and organizations, and even governments.

Methods of Data Collection and Analysis

In embodiments, methods according to the present disclosure iteratively reconstruct the sequential stories (i.e., the developmental histories) of private companies using multiple, heterogeneous, and distributed sources of information that may later be used to infer the current hidden stage of a new company and compute the probability of reaching future stages.

In embodiments, a company story may be represented using a Complex Hidden Markov Model (CHMM) having a set of sequential states of development. In the present disclosure these developmental states may be referred to as “hidden states”, inasmuch as companies do not report what state they are in, or even use such a taxonomy to chart their own development. In such CHMMs, each hidden state has a 1-to-1 correspondence to each of the sequential stages that a company's life can go through. These stages may, for example, follow Bell's stages of: concept, seed, product development, market development and steady state (Bell, C. Gordon, High-tech Ventures: The Guide For Entrepreneurial Success, 1991), or, for example, they may follow a more institutional investment structure, such as Series-A, Series-B, Series-C, or other structures that may be useful in a given industry or context.

Each hidden state has associated with it a minimum and maximum duration, expected values for a number of sub-scores (defined below) for the stage that the hidden state represents, and a vector that defines the probability of transitioning to one of the final states defined below.
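The attributes of a hidden state listed above may be pictured as a small record. The following is a minimal sketch only: the class name, field names, the choice of days as the duration unit, and all numeric values are assumptions made for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class HiddenState:
    """One hidden state of a company's CHMM (all field names are hypothetical)."""

    name: str
    min_duration_days: int  # assumed unit: days
    max_duration_days: int
    expected_sub_scores: Dict[str, float]  # expected value per sub-score
    final_state_probs: Dict[str, float]  # probability of reaching each final state

    def __post_init__(self) -> None:
        if not 0 <= self.min_duration_days <= self.max_duration_days:
            raise ValueError("durations must satisfy 0 <= min <= max")
        if any(not 0.0 <= p <= 1.0 for p in self.final_state_probs.values()):
            raise ValueError("each probability must lie in [0, 1]")
        if sum(self.final_state_probs.values()) > 1.0 + 1e-9:
            raise ValueError("final-state probabilities must not exceed 1 in total")


# Illustrative, made-up values for a "seed" state.
seed = HiddenState(
    name="seed",
    min_duration_days=90,
    max_duration_days=720,
    expected_sub_scores={"reach": 0.2, "team": 0.5},
    final_state_probs={"closed": 0.55, "acquired": 0.25, "ipo": 0.05, "acqui-hired": 0.15},
)
```

The validation in `__post_init__` is a design choice of this sketch; the disclosure does not state whether the transition vector must sum to one.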

In embodiments, there may be two types of observable states: final states and active states, described next.

Final states. When one of these states is reached it means that the company is in its last stage. Each final state can be associated with a success or failure depending on the associated economic output. In embodiments, final states may be recognized by observing or inferring public events using one or more sources of information. Final states may comprise, for example, company closed, acquired, IPO (initial public stock offering), or “acqui-hired”—referring to an acquisition whose primary purpose was the hiring of high-value employees.

Active states. These states are not directly observed but may be inferred using the mechanism described below that approximates the activity expected at the current state depending on previously inferred activity states of similar companies. In embodiments, a mechanism to monitor the trend and evolution of a number of metrics and match them against previously expected predictions may be used to track and infer the state of a company. One such metric, for example, is called “Reach”—a composition of the number of followers on social networks (for example, Twitter, etc.).

A CHMM hidden state abstracts the life cycle of a company in a hierarchical way. Thus, each state represents a potentially large number of events that are all represented by an active state. A CHMM active state may subsume a number of events. This is illustrated in FIG. 1, which shows five states 210 in the life of a company, each of which may subsume many actual events 220 (e.g., e1, e2, e3, . . . e_(n)) in the activities of the company.

In embodiments, each company in the system has a CHMM associated with it. While the same CHMM may be associated with more than one company, a CHMM is instantiated separately for each company. In embodiments, a CHMM instantiation can monitor and predict the stages of a company.

FIG. 2 illustrates an example CHMM with five hidden states, four final states, and one active state. It is noted that in general the number of hidden active states is variable, and may depend upon the industry, how the CHMMs are set up, and the available data.

In embodiments, the sources of data for a given company may include self-reported data, implicitly-generated data and explicitly-generated data. Self-reported data (SRD) is financial or other type of information that can directly unveil the stage of a company and is reported directly by the company or as part of an official requirement. Explicitly-generated data (EGD) is data created by the company to promote itself through social networks, blogs, marketing channels, etc. Finally, implicitly-generated data (IGD) is data that can be inferred from self-reported data, explicitly-generated data, or data generated by other sources, independent of the company. For example, the number of employees of a company can be approximated using the number of employees that individually report where they work on social networks.
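As a rough illustration of implicitly-generated data, the headcount approximation mentioned above may be sketched as a count of distinct people who list the company as their current employer. The record layout (`person_id`, `current_employer`), the function name, and the sample data are hypothetical; real profile data would likely need fuzzier name matching.

```python
from typing import Iterable, Mapping


def estimate_headcount(profiles: Iterable[Mapping[str, str]], company: str) -> int:
    """Count distinct people whose self-reported employer matches the company."""
    target = company.strip().lower()
    people = {
        p["person_id"]
        for p in profiles
        if p.get("current_employer", "").strip().lower() == target
    }
    return len(people)


# Made-up profile records for illustration.
profiles = [
    {"person_id": "a1", "current_employer": "Acme Robotics"},
    {"person_id": "b2", "current_employer": "acme robotics "},
    {"person_id": "a1", "current_employer": "Acme Robotics"},  # duplicate profile
    {"person_id": "c3", "current_employer": "Other Co"},
]
print(estimate_headcount(profiles, "Acme Robotics"))  # prints 2
```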

In embodiments, each data source may have an associated general level of confidence and an individual level of confidence for each of the dimensions that can be extracted from the particular source.

In embodiments, sources can duplicate information or provide contradictory information, either simultaneously or over time. Each source may have an associated level of confidence as a whole and may also have a level of confidence for each attribute that is used to compute the various metrics described below. The level of confidence for each data source, and for each attribute provided by the data source, can change over time. When this happens the stories that are affected by the confidence level of those sources need to be adapted accordingly.

In embodiments, stories for a company may be reconstructed in multiple passes separated in time and using different sources of information. For companies that have reached a final state, their story can be completely reconstructed and can be used to construct a model that better computes the sub-scores that help match the active states.

Example System and Methods

FIG. 3 illustrates a high-level view of a system that may be used to implement various methods of the present disclosure. The illustrated system may comprise data sources 1 that are processed into slowly changing dimensions 3 and fast changing dimensions 5. Data comprising both of these types of dimensions may be automatically clustered,

In embodiments, a system can process a number of data sources 1 that may provide data for individual or groups of entities in batches, or upon request, on a regular basis. Data sources 1 may include self-reported data, explicitly-generated data, and implicitly-generated data, as described above.

In embodiments, data may be processed to generate a number of company stories using a CHMM for each group of similar companies, as shown at 1 in FIG. 3. To decide the number of company stories to build, data may first be sequenced in time for all available companies and automatically clustered using a combination of SCDs 3 and FCDs 5, as shown at 2. In embodiments, both the number of clusters used, and the number of companies assigned to each cluster, may change over time when more company stories are reconstructed or previous company stories change.

In embodiments, data can be textual, networked, and tabular, and may be obtained in CSV, JSON, XML, or any other machine-readable data format. In embodiments, data can be input via web forms, documents, or requested via a remote API.

As noted, the data corresponding to or associated with each company may be extracted and processed to generate two types of dimensions: Slowly Changing Dimensions (SCDs) 3, which are company attributes that only change from time to time (e.g., name of a company, address, etc.), and Fast Changing Dimensions (FCDs) 5, which are company attributes that may change frequently, or even constantly (e.g., number of employees, number of followers in social networks, etc.).

In embodiments, each dimension 3, 5 may be represented by sequences of complex events. Each event, in addition to the information that describes the company's attribute, may also have associated with it a level of confidence. The level of confidence may generally depend on both the data source that generates the event and the particular level of confidence of that data source for the specific attribute.
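One way to picture the per-event confidence described above is to combine a source's general confidence with its confidence for the specific attribute. The sketch below multiplies the two; that combination rule, the class and function names, and the numeric values are assumptions for illustration only, since the disclosure does not specify how the two levels are combined.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SourceConfidence:
    """Confidence levels of one data source (hypothetical structure)."""

    general: float  # confidence in the source as a whole, in [0, 1]
    per_attribute: Dict[str, float]  # confidence per attribute, in [0, 1]


def event_confidence(source: SourceConfidence, attribute: str,
                     default_attribute_confidence: float = 1.0) -> float:
    """Assumed rule: event confidence = general confidence x attribute confidence."""
    attribute_conf = source.per_attribute.get(attribute, default_attribute_confidence)
    return source.general * attribute_conf


# Made-up confidence levels for a hypothetical social-network source.
social = SourceConfidence(general=0.8, per_attribute={"headcount": 0.5})
print(event_confidence(social, "headcount"))  # 0.8 * 0.5
print(event_confidence(social, "followers"))  # falls back to the general level
```

Because confidence levels may change over time, a function of this kind would be re-evaluated whenever a source's levels are updated, rather than stored once per event.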

A sequence of events that represents the sequential story of the company is generated using all the information collected from the company on each iteration.

Sequences of events can be segmented in non-deterministic ways, time-warped according to different criteria, and compared with other sequences using dynamic distances that can later be used to automatically adapt model building for predictive tasks in a very efficient way.

In embodiments, for each entity, the closest story is found using the entity's SCDs. A distance is computed using a combination of the entity's SCDs and the centroid of the entities that were clustered together to compute each story.
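The matching of an entity to its closest story may be sketched as a nearest-centroid lookup. This sketch assumes that SCDs have already been encoded as numeric vectors and uses Euclidean distance; both the encoding and the distance are assumptions, as are the function names and sample values.

```python
import math
from typing import Dict, List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of the SCD vectors clustered into one story."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def closest_story(entity_scd: Sequence[float],
                  story_centroids: Dict[str, Sequence[float]]) -> str:
    """Return the name of the story whose centroid is nearest to the entity."""
    return min(
        story_centroids,
        key=lambda name: math.dist(entity_scd, story_centroids[name]),
    )


# Made-up two-dimensional SCD encodings for two stories.
stories = {
    "story_a": centroid([[0.0, 0.0], [2.0, 4.0]]),  # centroid [1.0, 2.0]
    "story_b": centroid([[10.0, 10.0], [12.0, 8.0]]),  # centroid [11.0, 9.0]
}
print(closest_story([1.5, 2.5], stories))  # prints story_a
```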

In embodiments, each FCD may be a sub-score 4 that in turn may be computed using a number of metrics that change over time. Each time a new entity is monitored by the system a mechanism may be activated to retrieve or estimate the data points that comprise each metric. Sub-scores 4 may be estimated using non-linear combinations of the corresponding metrics.
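A non-linear combination of metrics into a sub-score may be sketched as follows. The particular form used here (a logistic function applied to a weighted sum of log-scaled metrics) is an assumption chosen for illustration, as are the metric names and weights; the disclosure does not specify the combination.

```python
import math
from typing import Mapping


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sub_score(metrics: Mapping[str, float], weights: Mapping[str, float],
              bias: float = 0.0) -> float:
    """Map raw, non-negative metrics to a sub-score in (0, 1).

    Each metric is log-scaled with log1p so that large counts (for example,
    follower numbers) do not dominate; metrics without a weight are ignored.
    """
    z = bias + sum(
        weights.get(name, 0.0) * math.log1p(value)
        for name, value in metrics.items()
    )
    return _sigmoid(z)


# Made-up weights and metrics for a hypothetical "reach" sub-score.
weights = {"twitter_followers": 0.5, "facebook_followers": 0.3}
small = sub_score({"twitter_followers": 100, "facebook_followers": 50}, weights, bias=-4.0)
large = sub_score({"twitter_followers": 10000, "facebook_followers": 5000}, weights, bias=-4.0)
print(small < large)  # prints True
```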

In embodiments, the history of levels of activity across the various metrics may be compared against expected activity. Using the level of activity and the corresponding CHMM a prediction for the current state may be generated. This is represented by the “active state” at the bottom of FIG. 2. It is noted that in FIG. 3, the SCDs 3 and FCDs 5 are processed together with a CHMM to generate the active state, shown in a version of FIG. 2 for the company under analysis. It is from this instantiated CHMM for the company that a future state prediction 6, and a score 7 may be generated.

In embodiments, given the prediction for a hidden (active) state, the probability distribution for each final state, and the exit statistics for companies that were used to build the CHMM (the story), a score 7 may be computed. Preferably, each story may be built using the historic data of a number of companies (using a cluster of slowly-changing features (SCDs)), and the data may include the exit amount (acquisition price, IPO amount, etc.). The score may be a number between 0 and 1. The closer to 1, the higher the chances that the company will become successful.

In embodiments, for each final state, an exit value (or an approximation) may be known. The exit value may be based on the average of exit values where only exits above a certain percentile and below a certain percentile are considered.

As the probability for each state is known, an expected value may be computed. Finally, in embodiments, a maximum exit value may be normalized, corresponding to the 90th percentile.
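The score computation described above may be sketched in three steps: a percentile-trimmed average exit value per final state, a probability-weighted expected value, and a normalization against a maximum exit value. The sketch assumes a nearest-rank percentile, inclusive trimming bounds, and that normalization means dividing by the 90th-percentile exit value and capping at 1; these choices, the function names, and all sample figures are assumptions, not the disclosed method.

```python
from typing import Mapping, Sequence


def percentile(values: Sequence[float], q: int) -> float:
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(values)
    rank = max(1, (q * len(ordered) + 99) // 100)  # integer ceiling of q*n/100
    return ordered[rank - 1]


def trimmed_mean_exit(exits: Sequence[float], low_q: int = 10, high_q: int = 90) -> float:
    """Average of exit values lying between the two percentiles (inclusive)."""
    low, high = percentile(exits, low_q), percentile(exits, high_q)
    kept = [v for v in exits if low <= v <= high]
    return sum(kept) / len(kept)


def company_score(final_state_probs: Mapping[str, float],
                  exit_values: Mapping[str, float],
                  max_exit_value: float) -> float:
    """Expected exit value, normalized to [0, 1] by an assumed maximum."""
    expected = sum(
        p * exit_values.get(state, 0.0) for state, p in final_state_probs.items()
    )
    return min(1.0, max(0.0, expected / max_exit_value))


# Made-up exit amounts (arbitrary units) for illustration.
acquired_exits = [10, 20, 30, 40, 1000]
all_exits = [10, 20, 30, 40, 50, 60, 70, 80, 100, 1000]

exit_values = {
    "acquired": trimmed_mean_exit(acquired_exits, 20, 80),  # outlier 1000 is dropped
    "ipo": 200.0,
    "closed": 0.0,
}
cap = percentile(all_exits, 90)  # assumed 90th-percentile maximum
probs = {"acquired": 0.4, "ipo": 0.1, "closed": 0.5}
score = company_score(probs, exit_values, cap)
print(round(score, 3))  # prints 0.3
```

Trimming at both ends keeps a single very large exit from inflating the per-state exit value, which is one plausible reason for considering only exits between two percentiles.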

Granular Description of Embodiment Including Software Environment

In embodiments, the following processes may be implemented. It is assumed that the embodiments include a software environment having registered users, who may use the environment to research companies and obtain predictions, using the system and methods according to the present disclosure, as described above. Such an embodiment may be referred to as “the service” in the following description.

1. Information Gathering
2. Consolidation
3. Data Transformation, Feature Engineering, Training Models and Making Predictions
4. Presentation of the Information—Website
5. Presentation of the Information—Analyst Platform

These processes are next described in detail.

1—Information gathering

In embodiments, two mechanisms may be used to obtain company data: one automatic and the other manual. The automatic mechanism may obtain information from either professional sources, such as Crunchbase, Owler, etc., or from online services such as Twitter, Facebook, LinkedIn, etc.

The manual mechanism allows registered users to contribute information that they know about the companies (most of the time it is privileged information). The manual information may be used to train the models and to make “custom” predictions for users that have shared the information. For example, a user who knows that a company has received an investment that has not been published may use this information to get better predictions from the service, predictions that are different from those generated for all the users.

Additionally, in embodiments, information about investors, areas, and global economic indicators may be obtained.

2—Data consolidation

After information has been obtained from the various sources, a consolidation process may be performed. Some sources have more “confidence” than others and rules may be used to decide what the final information set for the company may comprise. For example, the Headcount of the company can come from either Crunchbase, Owler, LinkedIn, or other sources. In embodiments, the system may use the one from Crunchbase as the final value for Headcount.
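A consolidation rule of the kind described above may be sketched as a fixed source-priority lookup. Only the preference for Crunchbase as the source of Headcount comes from the description; the ordering of the remaining sources, the lowercase source keys, and the function name are assumptions for illustration.

```python
from typing import Any, Mapping, Optional, Sequence

# Crunchbase first, per the Headcount example; the rest of the order is assumed.
SOURCE_PRIORITY = ("crunchbase", "owler", "linkedin")


def consolidate(values_by_source: Mapping[str, Any],
                priority: Sequence[str] = SOURCE_PRIORITY) -> Optional[Any]:
    """Return the value from the highest-priority source that reported one."""
    for source in priority:
        value = values_by_source.get(source)
        if value is not None:
            return value
    return None


# Made-up headcount values reported by different sources.
print(consolidate({"owler": 120, "crunchbase": 100, "linkedin": 95}))  # prints 100
print(consolidate({"crunchbase": None, "linkedin": 95}))  # prints 95
```

A rule keyed on per-attribute confidence levels, rather than a fixed ordering, would be a natural alternative where one source is more reliable for some attributes than for others.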

In embodiments, the following information may be harvested about the companies. In embodiments, all of this information may be used to train models and make the predictions.

Company general information. This may include Foundation date, Legal Name, Commercial Name, Description, Elevator pitch, Website, Address-Location (Country, State, Region, City), Headcount, Market, Public/Private, Business areas, and Current status (whether the company is running, has been acquired, or has made an IPO).

In embodiments, the system may crawl the company website to get more information about these categories. Most of the time it only gathers the home page, but in embodiments, it also captures the products/services pages.

- Company team members, information about their CVs (professional history, education), current position, start/end dates.
- Company Board.
- Founders.
- Funding: information about the rounds: amount, investors that have participated, % of each investor, date of the round.
- Patents: which patents the companies have, covering what technologies, when they were issued, patent abstracts.
- Social Networks Activity (Twitter, Facebook). Tweets, Followers, Following, Description, Likes performed, Last tweet date.

In embodiments, information may be continuously harvested, and a company footprint may be based on the results of this harvesting, creating its timeline view.

3—Data Transformation, Feature Engineering, Training Models and Making Predictions

In embodiments, a system may have more than 100 ETLs (Extract, Transform and Load processes) to maintain the company history and metrics, train models, make predictions and publish the data to a system website.

To train the models, and thereby get predictions about the companies for which data is being gathered, the system may use BigML and other ML/NLP algorithms.

In embodiments, the following predictions may be performed by a system:

- Company transitions. The likelihood of a company being in each of the following states: running, IPO, acquired, acqui-hired, zombie or closed.
- Company score. The likelihood of success of the company.
- Company stage. Identification of the different stages the company has passed through and the current stage (Concept, Seed, Product Development, Market Development, Steady).
- Company top area. From all the business areas of the company (obtained through the professional sources), identify the main one.
- Days to exit. The expected exit date.
- Ranking of the company in various areas: in the World, in their Country, in their Business Area.
- Top competitors. The list of top competitors for the company.
- Top similar. The list of top similar companies, i.e., companies that have something in common with the company, regardless of whether they compete in the same market. This can help the companies to find synergies.
- Company business areas (may use the areas that come from the professional sources such as Crunchbase, as well as other sources).
- Days until the next funding round.
- Amount of the next funding round.
- Company valuation.

4—Presentation of the Information—Website.

In embodiments, various and different visualizations may be used to present the data generated in the previous stages.

Company Profile: Search for the profile of any of the more than 400K companies that are available in PreSeries. All of the descriptive and predictive information that the system generates about them may be shown.

Investor Profile: All of the descriptive and predictive information that the system generates about the investors.

Rankings of investors: in embodiments, a user may obtain listings of investors using different criteria. Ordering by number of companies invested, amount invested, focusing only in one location, selecting investors that have invested in specific areas, or that invest in specific stages of the company, etc.

Comparison tool for Investors. In embodiments, the system may give users the ability to compare investors using a scatterplot and different metrics.

Rankings of Areas. Same as rankings of investors, but applied to areas.

Comparison tool for Areas. In embodiments, the system may give users the ability to compare areas using a scatterplot and different metrics.

5—Presentation of the Information—Analyst Platform.

In embodiments, an advanced type of subscription to the system may be offered: the ANALYST subscription. This advanced access gives an “Analyst” subscriber access to all the datasets and models used to make the predictions.

In embodiments, these advanced users may play with the data generated by the system (datasets) and create their own models.

In embodiments, this functionality may be provided by using the BigML Platform. The inventors have, in exemplary embodiments, integrated BigML into PreSeries to give a better user experience (appearance, automatic creation of the accounts, auto-login in BigML from PreSeries using a link, etc.).

In alternate (advanced) embodiments, an analyst subscriber may plug their own models into the PreSeries system. They may thus customize PreSeries as they desire, which can be of great value to them and give them a significant competitive advantage.

Given both the description of FIGS. 2-3, and the more granular description of the example embodiment presented above, next described are FIGS. 4-21, which include various screen shots from a software environment related to an exemplary PreSeries embodiment implementing a system and methods according to the present disclosure.

FIG. 4 illustrates the gathering and processing of information regarding companies or entities to generate a CSV database, in accordance with various embodiments. Referring to FIG. 4, data may be obtained from various support sources, including Crunchbase entities 410, or other support sources 415. The data may then be imported 420, processed 430 and features extracted 440, to populate a CSV database 450, as shown. It is such a database that may be used to generate

FIG. 5 illustrates a screen from an exemplary software environment in accordance with various embodiments showing a set of jobs, contexts, code and templates that may be used to import, transform and generate a CSV database as shown in FIG. 4, ready to model.

FIG. 6 illustrates a screen from an exemplary software environment illustrating the importation of basic files into a local database, in accordance with various embodiments.

FIG. 7 illustrates a screen from an exemplary software environment illustrating the extension of companies with additional data, in accordance with various embodiments.

FIGS. 8-9 illustrate screens from an exemplary software environment illustrating the aggregation of data by year, country, state, region, and city, and additionally by year, country, state, region, city and status, in accordance with various embodiments.

FIG. 10 illustrates a screen from an exemplary software environment illustrating the creation of a final dataset structure, in accordance with various embodiments.

FIG. 11 illustrates a screen from an exemplary software environment illustrating the generation of basic features and founded previous years' metrics, in accordance with various embodiments.

FIG. 12 illustrates a screen from an exemplary software environment illustrating the generation of founded previous years' metrics by status, in accordance with various embodiments.

FIG. 13 illustrates a screen from an exemplary software environment illustrating the generation of founded previous years' metrics by city-status, in accordance with various embodiments.

FIG. 14 illustrates a screen from an exemplary software environment illustrating the generation of founded next years' metrics, in accordance with various embodiments.

FIG. 15 illustrates a screen from an exemplary software environment illustrating the generation of on_exit previous years' metrics, in accordance with various embodiments.

FIG. 16 illustrates a screen from an exemplary software environment illustrating the generation of success and patent metrics, in accordance with various embodiments.

FIG. 17 illustrates a screen from an exemplary software environment illustrating the generation of team, roles and Twitter™ metrics, in accordance with various embodiments.

FIG. 18 illustrates a screen from an exemplary software environment illustrating the generation of an output dataset, in accordance with various embodiments.

FIG. 19 illustrates a screen from an exemplary software environment illustrating the generation of a final score 1901, for a given company, in accordance with various embodiments. With reference to FIG. 19, the company's final score 1901 is generated from six subparts, relating to different performance areas of the company, including individual scores for each of team 1910, market 1920, product 1930, funding 1940, activity 1960 and reach 1950. These individual scores may be combined using various weightings to generate the overall score 1901 and its overall confidence interval. In the illustrated case, each of the subpart scores is close to the final score of 51. The final score is expressed as a decimal fraction of 1.0.
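
As a rough illustration of the weighted combination described for FIG. 19, the following Python sketch combines six subpart scores into one overall score using a weighted mean. The weights, the example values, and the function name `combine_scores` are hypothetical; the disclosure does not specify them, and the confidence interval is not modeled here.

```python
def combine_scores(subscores, weights):
    """Weighted mean of subpart scores, each expressed as a fraction of 1.0."""
    total_weight = sum(weights[name] for name in subscores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(subscores[name] * weights[name] for name in subscores)
    return weighted_sum / total_weight


# Hypothetical subpart scores and weights (not taken from FIG. 19).
subscores = {"team": 0.50, "market": 0.52, "product": 0.49,
             "funding": 0.53, "activity": 0.51, "reach": 0.51}
weights = {"team": 2.0, "market": 1.0, "product": 1.0,
           "funding": 1.0, "activity": 0.5, "reach": 0.5}

final_score = combine_scores(subscores, weights)
print(round(final_score, 2))
```

Other combiners (for example, a learned, non-linear model) could be substituted for the weighted mean without changing the surrounding structure.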

FIG. 20 illustrates a screen from an exemplary software environment illustrating the calculation of a company's reach value, in accordance with various embodiments. Referring to FIG. 20, the reach 2001 of a company may comprise its followers or subscribers on various on-line outlets, such as, for example, Twitter followers 2010, blog followers 2020, Facebook followers 2030 and Google followers 2040. In the case of reach, the followers or subscribers from all on-line outlets or interactive media are added to obtain the overall reach 2001 of the company.

FIG. 21 illustrates a screen from an exemplary software environment illustrating in detail various components of a final score 2110. Referring to FIG. 21, the calculation of a score 2110 here is an alternate process to that illustrated in FIG. 19. Here the final score is a function of twelve categories of company performance, organized in four main groupings: People 2120, Marketing 2140, Product 2150, and Finance 2180. Each main grouping has, in this example, three sub-categories, as follows: People 2120 includes CEO, Board of Directors, and Team (examples of how to calculate the value of a team to a given company are described more fully below in connection with FIGS. 22 through 26). Marketing 2140 includes Sales, Marketing, and Business Plan. Product 2150 includes Engineering, Technology, and Development. Finally, Finance 2180 includes Financeability, Control and Cash.

In embodiments, by calculating various metrics regarding a company along these twelve sub-categories, and combining the metrics in various ways as illustrated in the example screens shown in FIGS. 4 through 20, an overall score 2110 may be generated. FIG. 21 provides a set of example scores generated using the scoring approach illustrated in the upper panel.

Team Scores And Entrepreneurial Character

As noted above, in embodiments, a technical tool for evaluating investment opportunities to identify those which are likely to be more successful than others is presented in this disclosure. As also noted, such a tool would allow an investor to evaluate startups early in their evolution and to predict the success they are likely to achieve. One important factor in the success of a startup company is the team of people working for the company. Thus, FIG. 17, described above, illustrates the generation of team and team by role metrics in accordance with various embodiments.

Thus, systems and methods are also disclosed that include characterizing individual team members and combining the individual characterizations into an aggregate characterization of the team. In embodiments, the system and methods next described may use machine learning technologies to evaluate multiple types of data about individuals and teams to predict the likelihood the company will be successful and therefore be a good investment. These individual and team characterizations can be combined with other measures of company performance relevant to predicting whether the company will succeed or fail, such as those described above.

Thus, in embodiments, besides external data regarding a company and its activities, i.e., data that relates to or describes the activities of a company and those of its individual members or employees, a system may go significantly further, and infer certain highly relevant character traits of its individual members, as well as of groups of such individuals working as a team in various aspects of a company. In embodiments, a system may infer the entrepreneurial character of an individual (team member) and a group of individuals working as a team (team) using disperse and heterogeneous sources of data. In what follows, the “entrepreneurial character” will sometimes be referred to by the acronym “EC.”

An EC represents whether an individual's current entrepreneurial character is appropriate to start or continue leading a company, and whether the character will evolve positively or negatively. In embodiments, data to be used in an EC analysis may be extracted from web pages, structured forms, or documents in different formats (e.g., pdf, doc, etc.). The data may be collected during an extended period of time at more or less regular intervals.

In embodiments, an entrepreneurial character (EC) may comprise a feature vector, where each individual feature can be either of type Boolean (with an associated confidence level) or of type Numeric, a value normalized within a certain scale. Each individual feature may also have associated with it a pair of values: (1) a number that represents the level of mutability of the character, and (2) a sign. These values may respectively be used to know whether the value associated with an individual character feature is expected to change (mutability), and in which direction (positively or negatively) that change may likely occur. For each feature, the evolution of its value, mutability level, and sign may be stored over time.
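
One possible in-memory layout for such a feature vector is sketched below in Python. The class and field names (`Observation`, `Feature`, `EntrepreneurialCharacter`, `mutability_level`, and so on), the 0 to 1 scales, and the sample values are assumptions made for illustration; the disclosure does not prescribe a particular data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Observation:
    """One dated reading of a feature: value, confidence, and mutability pair."""
    timestamp: str
    value: Union[bool, float]           # Boolean, or Numeric normalized to 0..1
    confidence: Optional[float] = None  # used with Boolean values
    mutability_level: float = 0.0       # 0 = not expected to change (assumed scale)
    mutability_sign: int = 0            # +1 positive change, -1 negative change


@dataclass
class Feature:
    """A named EC feature together with its stored evolution over time."""
    name: str
    observations: List[Observation] = field(default_factory=list)

    def record(self, observation: Observation) -> None:
        self.observations.append(observation)

    def current(self) -> Optional[Observation]:
        return self.observations[-1] if self.observations else None


@dataclass
class EntrepreneurialCharacter:
    """Feature vector for one team member."""
    member: str
    features: List[Feature] = field(default_factory=list)


strong_willed = Feature("strong-willed")
strong_willed.record(Observation("2016-01", True, confidence=0.8,
                                 mutability_level=0.1, mutability_sign=-1))
expertise = Feature("expertise in topic X")
expertise.record(Observation("2016-01", 0.4, mutability_level=0.7, mutability_sign=1))
expertise.record(Observation("2016-07", 0.6, mutability_level=0.6, mutability_sign=1))

ec = EntrepreneurialCharacter("TM1", [strong_willed, expertise])
```

Keeping every observation, rather than only the latest, is what allows the evolution of a value and its mutability to be examined later.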

In embodiments, the values that together compose an EC can be summarized into one single value that may then be used to rank individuals with respect to each other. To compute the EC of a team, the individual EC of each team member may first be computed.

In an individual EC, in general, the entrepreneurial character improves when the value of each one of the features is maximized. However, in a team EC, the character can become worse when the same features are maximized for more than one team member. For example, considering that in a given embodiment one of the EC features is “strong-willed”, it would be disastrous for the team if two or more team members scored highly in it. Two alphas may actually destroy a team, rather than achieve its goals. For other features, however, such as “expertise in a certain topic X”, it may be very positive for more than one team member to have significant expertise. Further, where multiple team members have each reached significant mastery of a different subject, the synergy thereby created may be even more valuable, implying that diversity is good. For example, in the early history of Apple Computer, Steve Jobs was a master marketer and entrepreneur. Steve Wozniak was a technical genius, who could use the fewest parts and integrated circuits on a board to achieve the most computing power. The synergy of these two men's talents led to the significant innovations of that company.

In embodiments, the right combination for each type of feature for a group EC may be learned using data from past successful companies, by applying machine learning technology to the data. The combination might not be linear.

A. Team Member Profiles

FIG. 22 is a simplified conceptual diagram of a system and method for applying machine learning to objectively and systematically evaluate a team of individuals. We refer to a “team” in the general sense of a group of individuals who work together for a common purpose. More specifically, this disclosure uses a business, especially a start-up entity, as an example of a common purpose. This system infers an entrepreneurial character (EC) of an individual (team member), and a group of individuals working as a team, using disperse and heterogeneous sources of data. The system can be fully automated.

In FIG. 22, an example of a system 2200 is shown. The system comprises a set of software components executable on one or more processors. Preferably, the processors may be provisioned locally or in a scalable network of computers or processors, aka “the cloud.” The system 2200 may be configured to input or receive data from data sources 2202. For example, data used can be extracted from web pages 2201, structured forms 2203, or documents 2205 in different formats (pdf, doc, etc.). Other data formats may include CSV, ARFF, JSON, etc. Data sources 2202 may be public, paid, or private domain. These examples of data sources are merely illustrative and not limiting. Details of collecting electronic data, for example, over a network, are known. In some cases, automated processes may be used to search the web and collect potentially useful data. The data preferably is collected over an extended period of time, at more or less regular intervals, although the frequency and number of data collections is not critical.

A feature synthesizer component 2210 may be arranged to process the received data from data sources 2202. Such processing may include identifying data fields, field types, and corresponding data values, etc. The feature synthesizer may determine which fields of data to import into a dataset, and which to ignore. The processing may include cleaning or “scrubbing” the data to remove errors or anomalies. In some cases, text analysis may be applied to text fields, for example, tokenizing, stop word processing, stemming, etc. to make the data more usable.
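
The text-analysis steps named above (tokenizing, stop word processing, stemming) can be sketched in a few lines of Python. This is a deliberately naive illustration: the stop-word list, the suffix-stripping rule in `naive_stem`, and the function names are assumptions, and a production system would more likely rely on an established stemmer.

```python
import re

# A tiny, hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to"}


def naive_stem(word):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word


def preprocess(text):
    """Lowercase and tokenize the text, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


terms = preprocess("Designed and tested the routing of circuits")
print(terms)
```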

More importantly, feature synthesizer 2210, although illustrated as a single entity for simplicity, may actually comprise N individual feature synthesizer components. Thus, in FIG. 22, many elements above Data Sources 2202 are provided N times, in a structure designed to process in parallel 1 through N elements or attributes. Each individual feature synthesizer may be arranged to provide data, derived from the input data sources 2202, and store it in a corresponding Dataset 2215 for use with a corresponding specialized model builder 2225. The system may be initialized or configured for processing a given set of attributes of interest. One example, a set of 12 attributes (i.e., N=12 in this example) about team members that could be included in an entrepreneurial team member's profile, directed to background and experience data, is listed below.

In embodiments, a feature synthesizer for a given attribute may be configured to recognize, and extract from the input data, information that is indicative of the attribute of interest. It may then store the extracted data in the corresponding dataset 2215 through 2217. As discussed below, the process may be repeated periodically over time. To illustrate, a feature synthesizer directed to technology understanding, for example, might look for data on a person's education, technical degrees, patents, and work experience. It may collect what degrees were earned at what schools, and when. It might even look for grade reports or special awards or designations such as cum laude. It may evaluate technical publications of which the person was an author. All of this data may be collected into a dataset for the technology understanding attribute. As another example, a feature synthesizer for an attribute attention to detail may collect writings authored by the person of interest, and determine a frequency of misspellings or grammatical errors in those writings. Or, inconsistencies within the same writing may be an indicator of lack of attention to detail. Again, the corresponding feature synthesizer component gleans data relevant to its task from the input data sources and stores it in a dataset.
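
To make the attention-to-detail example concrete, the Python sketch below estimates a misspelling frequency by checking each word of a writing sample against a reference vocabulary. The function name `misspelling_rate`, the tiny vocabulary, and the sample text are hypothetical; the disclosure does not say how misspellings are detected.

```python
import re


def misspelling_rate(text, vocabulary):
    """Fraction of word tokens in the text that are absent from the vocabulary."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for word in words if word not in vocabulary)
    return unknown / len(words)


# Hypothetical reference vocabulary and resume fragment.
vocabulary = {"i", "managed", "a", "team", "of", "five", "engineers"}
resume_text = "I managed a teem of five enginers"

rate = misspelling_rate(resume_text, vocabulary)  # 2 of the 7 words are unknown
```

The resulting rate could be stored in the dataset for the attribute as one of several signals, alongside, for instance, formatting inconsistencies.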

In embodiments, a dataset must also include an assessment or score for the particular attribute or variable of interest, at least for some of the records. In some cases, this evaluation may be conducted programmatically. In other cases, records may be evaluated by an expert with regard to the attribute of interest, and the evaluation results input to the dataset in association with the records reviewed. The evaluation may be expressed as a binary result (detail oriented or not detail oriented; high level of technical understanding, or not). In some embodiments, these evaluations may take the form of an analog value, say between 0 and 1.

Referring again to FIG. 22, a plurality (N) of specialized model builder components 2225 through 2227 may be provided. Each model builder may be arranged to create a machine-usable model of the data in its dataset 2215 through 2217 (corresponding to attributes of interest 1 through N). For each attribute of interest 1 through N, some data in the corresponding dataset that includes an evaluation of the attribute of interest may be used to train the model. In this way, the model 2230 through 2233 may then be used to evaluate additional data in the corresponding dataset that does not include an explicit assessment of the attribute or variable of interest. The trained model can then predict that assessment and provide a score for the corresponding attribute. In FIG. 22, each score output from an individual model provides a corresponding character element 2245 through 2247. The character element scores 1-N together form an individual team member character vector or profile. As noted, in embodiments, Boolean or analog values may be used. Analog values preferably are normalized in Individual Character Normalizer 2251, and then the vector 2250 stored in memory. Details of building machine learning models such as classifiers are known. For example, see U.S. Pat. No. 9,269,054 to Francisco J. Martin, et al., incorporated herein by this reference.
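
The train-on-labeled, predict-on-unlabeled pattern described above can be illustrated with a very small classifier. The Python sketch below uses a nearest-centroid rule purely as a stand-in; it is not the model type used by the disclosed system or by BigML, and the feature columns, labels, and function names are hypothetical.

```python
def train_centroids(records):
    """Average the feature rows of each label; returns {label: centroid}."""
    sums, counts = {}, {}
    for features, label in records:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def distance(label):
        return sum((c - x) ** 2 for c, x in zip(centroids[label], features))
    return min(centroids, key=distance)


# Hypothetical dataset for "technology understanding":
# (years of technical experience, number of patents) plus an expert's label.
labeled = [((8.0, 3.0), True), ((10.0, 1.0), True),
           ((0.5, 0.0), False), ((1.0, 0.0), False)]
unlabeled = [(7.0, 2.0), (0.0, 0.0)]  # records with no explicit assessment

model = train_centroids(labeled)
predictions = [predict(model, row) for row in unlabeled]
```

One such model would be built per attribute of interest, and its outputs would fill the corresponding element of a team member's character vector.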

Example types of information (attributes) about team members that could be included in an entrepreneurial team member's profile may include background and experience data, as follows (as shown in FIG. 23):

-   -   Perseverance 2301—Have they finished a long-term endeavor? (e.g. 5-year degree plus Ph.D.)
    -   Adaptability 2302—Have they lived in more than one city, country, and for how long?
    -   Competitiveness 2303—Have they won a significant prize or award? Involved in other competitive activities like sports?
    -   Creativity 2304—Have they invented something special?
    -   Communicativeness 2305—Have they presented at many conferences?
    -   Detail Orientation 2306—Misspellings in the resume, paragraphs indented irregularly, etc.
    -   Market Understanding 2307—How many years of experience do they have?
    -   Technology Understanding 2308—Do they hold a tech degree? Do they have practical tech experience?
    -   Other Experience 2309—Do they have business experience? Startup experience?
    -   Network 2310—Number of connections? MBA?
    -   Customer Orientation 2311—Have they held a sales role?
    -   Design Orientation 2312—Have they attended design school? Do they have practical design experience?

In embodiments, other information that may be included in a profile might address character attributes such as “nonconformist?”, “dissenter?”, or “maverick?”, or aggregate attributes such as “rebel” for the preceding distinct attributes. Suitable feature synthesizers can be configured to collect the data for model building.

In some systems, data may be collected for a mature organization, as distinguished from a startup. By “mature” is meant an entity that has reached an “outcome” indicative of success or failure (conveniently, a binary variable), or a “final state” as described above. Preferably, as noted above, such data may be collected from thousands of organizations so that it is statistically meaningful. Further, detailed information for each such entity may include attribute data for each team member in that entity, such as described herein, and as referred to above in FIG. 17. That data may be processed, and the actual outcomes included in appropriate datasets. This information may be used to further train or “tune” the attribute models by taking into account the eventual outcomes of actual companies.

Referring again to FIG. 22, in embodiments, a feature vector 2250 may be stored in a memory, the feature vector comprising a series of elements or fields numbered 1 to N for illustration, each element storing one or more values for a corresponding attribute or feature for an individual team member as discussed. In embodiments, it is not required that there be a value for every element in the vector. In some cases there may be insufficient data in the dataset for analysis. The number of features N is not critical; generally a greater number of features analyzed will tend to generate more reliable results. Tens of features may be sufficient, while hundreds may provide more accurate scoring, both for comparing individual team members, and for comparing aggregate team scores.

FIG. 22B illustrates an example of a feature vector in more detail. The feature vector 2250 may comprise a plurality of individual character elements or features 2252, again numbered 1 through N. In some embodiments, each individual feature can be either of type Boolean (preferably with a confidence level associated), or of type Numeric. Feature field 2254 is expanded in FIG. 22B to show a Boolean field 2256, along with a corresponding confidence field 2258. In addition, feature 2254 may include a mutability field 2260. Preferably, mutability field 2260 comprises a pair of values: (1) a number that represents the level of mutability of the character and (2) a sign. These values may be respectively used to indicate to what extent the value associated with an individual feature is expected to change over time (mutability) and in which direction (positively or negatively). The level of mutability may conveniently be scaled, for example, to 0-1.

In other embodiments, mutability may be a single Boolean value (indicating mutable or not). For example, whether a person (team member) speaks English might take a Boolean value, or it may have a scaled value from 0 (not at all) to 1 (fluent). Referring again to FIG. 22B, another feature 2266 is shown as expanded to illustrate a numeric field type 2268. In an embodiment, a numeric type of feature may have a value normalized within a certain scale. This feature (attribute) 2266 also may include a mutability value pair 2270 as just described.

Referring again to FIG. 22, the first character vector 2250 (shown at the bottom of the set of vectors) may be stored in memory, associated with a first time t₁. Before doing so, a determination of mutability may be made by a mutability component 2249, so that mutability data may be included in the character vector. At a later time t₂, additional data may be collected from data sources 2202, and processed by feature synthesizers 2210 for the same team member TM1. In embodiments, the new data may be processed by the same models 2230 through 2233, and a new feature vector formed, as before. The new feature vector may be added to memory, labeled as 2260. Subsequently, additional input data can be acquired and processed as described, at times t₃, t₄, etc. These data collections may be periodic or more random. Each of them is stored in memory, illustrated as vectors 2260 and 2270. The number is not critical.

In embodiments, the same process may be repeated for each team member, or any selected subset of a team. Thus, the feature synthesizer, as part of collecting raw data, will identify the other team members of interest, and collect data associated with each of them. Accordingly, a dataset may include records for each team member of interest, or separate datasets may be provisioned. Details of the data storage are a matter of design choice. In FIG. 22, a first set of vectors 2250, 2260 and 2270 at times t₁, t₂, t₃, respectively, are shown all for a first team member TM1. A second set of vectors may also be collected at different times for a second team member (TM2) (not labelled, but being the three middle vectors in the set of nine overlapping vectors in FIG. 22, where the 3rd and 4th, as well as the 6th and 7th, overlap). The collection times t₁, t₂, etc. may or may not be the same for each team member. Finally, a third set of vectors illustrated at 2290, 2292, 2294 are shown for a third team member (TM3). All of this data may be input to a multiple character combiner component 2269 to develop a team score.

FIG. 23 is a simplified conceptual diagram illustrating examples of personal characteristics or attributes of an individual team member that, in embodiments, may be used in evaluation of the team member and the team. The attributes shown, as listed above, are merely illustrative and not limiting. Each of these (and other) attributes can be the target of a corresponding model, built from the datasets as described above or equivalent kinds of data. The example attributes listed are positive, that is, generally desirable attributes. Other data can be acquired and analyzed to consider generally negative attributes, say felony convictions or bankruptcy filings. This data too can be quantified, and the influence used to reduce overall team member scores. Both positive and negative attributes, i.e., a combination, can be analyzed in a system of the type disclosed.

B. Team Profiles

Referring to FIG. 24, in embodiments, individual team member profiles may be combined by formal mathematical rules into an aggregate profile for the team. FIG. 24 shows a simplified flow diagram illustrating a computer-implemented process for generating a team character score (aggregate or team profile) 2410 based on combining character scores across a plurality of individual team members. In embodiments, individual profiles may be cast as elements of an abstract algebraic structure (e.g. a group, ring, module, field, or vector space) in those cases where the profile and rules for combining them have sufficient structure. They could also be characterized and combined in a more ad hoc fashion. In FIG. 24, Team Member Character Score #1 may be combined with Team Member Character Scores #2 through #M to form the Team Character Score 2405. In embodiments, each individual team member score (i.e., #1, #2, . . . #M), may comprise a plurality of elements 1 through N. The team score may comprise a feature vector, as described above.

Each vector may correspond to a vector such as those described with regard to FIG. 22 for a given team member. For example, the vectors for each team member 1 through M shown in FIG. 24 may correspond to vectors 2250, 2270 and 2290 in FIG. 22, each associated with a different team member. For each vector, an aggregate score may be determined by combining the individual attribute values in the corresponding vector. In embodiments, the aggregate scores may be determined by any suitable operation such as arithmetic sum, mean value, etc. Operations may be applied to combine numeric as well as Boolean values. These aggregate scores for each team member can be used to compare or rank team members. A reporting component (not shown) can generate these results to a user interface or API.
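
A minimal Python sketch of this per-member aggregation follows, using the mean as the combining operation. Treating Boolean values as 1.0 or 0.0 and skipping missing elements are assumptions made for the example, as are the member names and attribute values.

```python
from statistics import mean


def aggregate_score(vector):
    """Mean of the available attribute values; Booleans count as 1.0 or 0.0."""
    values = [float(v) for v in vector if v is not None]
    return mean(values) if values else 0.0


# Hypothetical character vectors; None marks an element with insufficient data.
team = {
    "TM1": [0.9, True, 0.4, None],
    "TM2": [0.6, False, 0.7, 0.5],
    "TM3": [0.8, True, 0.9, 0.7],
}

scores = {member: aggregate_score(vector) for member, vector in team.items()}
ranking = sorted(scores, key=scores.get, reverse=True)  # best score first
```

With these sample values the ranking is TM3, TM1, TM2. An arithmetic sum could be used instead of the mean, although the mean is less sensitive to missing elements.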

An EC (character score) represents, and quantifies objectively, whether or to what extent an individual is appropriate to start or continue leading a company and whether the character is predicted to evolve positively or negatively. More specifically, the mutability metrics, shown as M in FIG. 22, and for the team as MT 2413 in FIG. 24, may be acquired and analyzed over time in the vectors from T=0 to T=M. With these metrics, average values, rates of change, and other statistical measures can be used to assess and predict where each attribute is moving for those that are mutable. Increasing values of a positive attribute may contribute to a higher overall team member score, and to a higher team score 2405.
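
As one simple example of such a statistical measure, the Python sketch below computes the average rate of change of a single attribute across stored snapshots and derives a direction sign from it. The snapshot values, the time unit (months), and the function name are hypothetical; other measures, such as a fitted trend line, could equally be used.

```python
def average_rate_of_change(snapshots):
    """snapshots: list of (time, value) pairs ordered by time."""
    if len(snapshots) < 2:
        raise ValueError("at least two snapshots are needed")
    (t_first, v_first), (t_last, v_last) = snapshots[0], snapshots[-1]
    if t_last == t_first:
        raise ValueError("snapshots must span a non-zero time interval")
    return (v_last - v_first) / (t_last - t_first)


# Hypothetical readings of one numeric attribute at months 0, 6 and 12.
snapshots = [(0, 0.40), (6, 0.55), (12, 0.70)]

rate = average_rate_of_change(snapshots)      # change in value per month
sign = 1 if rate > 0 else -1 if rate < 0 else 0
```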

In embodiments, a combiner used to compute the overall EC of a team “TCS” 2415 may be adapted to reflect the type of company the team will operate. The same applies to other attributes about a company (market, stage, funding, etc.), as described above. In embodiments, a combiner for an individual EC or team EC may be a combination of combiners.

Distribution of Character Components

Some character components or attributes are generally positive for every individual in which they are found, for example, hard working or well educated, and they remain positive when these attributes are found from the input data to exist across multiple members of a team. In a sense, they may be considered additive contributions to the overall team score. In some cases, attributes such as assertiveness, strong leader, authoritarian may be positive for an individual, but may not be positive where found in multiple members on the same team. For this reason, our system may implement a paradigm or preferred distribution for each character component. For some attributes, a very small number of instances (team members) may be preferred. For other attributes, the more team members that exhibit the attribute, the better for overall team function. To that end, we create a preferred distribution for each character component. Then the process assesses how closely the distribution for a given attribute matches the preferred distribution. Mathematically, this can be done in various ways, for example, summing the differences between the actual distribution and the preferred distribution, or using a sum of squares, etc. In some embodiments, correlation coefficients may be used to assess this “closeness” or deviation from the preferred distribution.
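
The two deviation measures mentioned above can be sketched in Python as follows. The sketch assumes, as one possible interpretation, that a distribution is the list of an attribute's values across team members, compared after sorting so that it does not matter which member holds which value. The function name and sample values are hypothetical.

```python
def deviation(actual, preferred, squared=False):
    """Sum of absolute (or squared) differences between sorted distributions."""
    if len(actual) != len(preferred):
        raise ValueError("distributions must cover the same number of members")
    pairs = zip(sorted(actual, reverse=True), sorted(preferred, reverse=True))
    if squared:
        return sum((a - p) ** 2 for a, p in pairs)
    return sum(abs(a - p) for a, p in pairs)


# Hypothetical "strong leader" values for a three-person team.
preferred = [1.0, 0.2, 0.2]   # one strong leader preferred
actual = [0.8, 0.9, 0.1]      # two members score highly

abs_dev = deviation(actual, preferred)               # 0.1 + 0.6 + 0.1
sq_dev = deviation(actual, preferred, squared=True)  # 0.01 + 0.36 + 0.01
```

A smaller deviation indicates a team whose distribution for the attribute is closer to the preferred one.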

FIG. 24 also shows (bottom drawing) a simplified conceptual diagram illustrating a computer-implemented process for generating an overall team score based on combining team member character components and assessing the combined team member character components based on corresponding predetermined character component distribution data for each component. For example, in FIG. 24, a first row of elements 2420 may comprise the attribute values for a selected attribute #1 across all M team members, that is TM1 value #1 through TMM value #1. In embodiments, these values may be combined by applying a selected operator (indicated by the asterisk), to form an overall score 2423 for the team for that attribute #1. In the second row 2430, the element values may be collected for a second attribute #2, again for all M team members. A second operator (indicated by the asterisk) may be applied to this data to form the team result 2433. Similarly, additional operators may be applied for each element or attribute, across all team members, finally determining the last team attribute TM#N 2444.

The various operators may be selected according to the specific attribute of interest. To illustrate, if the team is going to work together in the English language, it would be important for all members of the team to speak English. Here, we will use English language skill for attribute #1, and assume it is a Boolean variable. Thus we apply the Boolean AND operator for the operator in row 2420 so that the team result at 2423 will be true (Boolean 1) only if all team members speak English.

As another example, suppose the team is going to build a web application for consumers to use. It would be important for at least one team member to be skilled at building user interfaces (UX). Here, we will use UX skill for attribute #2, and again assume it is a Boolean variable (the skill is present or it is absent in each team member, as ascertained from the input data by a corresponding feature synthesizer and model). Assuming that one person skilled in UX is enough, we apply the Boolean OR as operator, to determine the team result 2433. If one or more team members have that UX skill, the result 2433 will be true.

Suppose that attribute #N is “strong leader and authoritarian.” It would be helpful to have exactly one person on the team with that attribute. Again, for now, we assume it is a Boolean variable. For the operator in row 2440 we apply the Boolean XOR operator across the team members. If there is one team member with that attribute, the output at 2444 will be true. In general, Boolean logic can be applied to realize any desired composition of the team. Further, compound expressions can be used in forming the team values for a given attribute. A compound expression here refers to a Boolean operation where at least one of the operands is itself a Boolean function of the team members' data.
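The three per-attribute combinations described above (every member has the attribute, at least one member has it, exactly one member has it) can be sketched as follows. The function names are hypothetical. One point to note as an assumption about intent: a two-input XOR chained across more than two members computes parity, so it is true for any odd number of members with the attribute; the sketch therefore shows both the chained XOR and an explicit count check for "exactly one".

```python
from functools import reduce
from operator import xor

def all_members(flags):
    """True only if every team member has the attribute (logical AND)."""
    return all(flags)

def any_member(flags):
    """True if at least one team member has the attribute (logical OR)."""
    return any(flags)

def chained_xor(flags):
    """Two-input XOR folded across members: true for an odd count."""
    return reduce(xor, flags, False)

def exactly_one(flags):
    """True only if exactly one team member has the attribute."""
    return sum(bool(f) for f in flags) == 1

# Hypothetical team of three members.
speaks_english = [True, True, True]
has_ux_skill = [False, True, False]
strong_leader = [True, False, False]

print(all_members(speaks_english))  # True
print(any_member(has_ux_skill))     # True
print(exactly_one(strong_leader))   # True
```

With two or fewer members having the attribute, `chained_xor` and `exactly_one` agree; they differ when three (or any larger odd number of) members have it.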

The results at 2423, 2433 and 2444, that is the Boolean output for the team for each attribute, together form a team profile, that is, a vector of Boolean values. The number of “ones” can be counted to form a team score 2450. This score will improve in proportion to the number of elements or attributes for which the team “fits” the preferred distribution. This score can be used to compare teams or subsets of teams quite readily. Different sets of attributes can be used by creating a desired or paradigm distribution and processing the data with correspondingly selected operators. Comparison of the team's resulting profile to the paradigm distribution will immediately identify where the team misses the mark. As explained above, some attributes are not simply input data from the input data sources. Rather, some attributes must be inferred, or estimated, by the feature synthesizer and model building processes described above.
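A minimal sketch of forming the Boolean team profile and counting the "ones" to obtain a team score is given below. The attribute names, the member data, and the mapping of attributes to operators are illustrative assumptions, not data from the disclosure.

```python
# Hypothetical per-attribute operators: each takes a list of Booleans.
RULES = {
    "english": all,                               # every member must have it
    "ux": any,                                    # at least one member
    "leader": lambda flags: sum(flags) == 1,      # exactly one member
}

def team_profile(members, rules):
    """Apply each attribute's operator across all members."""
    return {
        attr: bool(op([m[attr] for m in members]))
        for attr, op in rules.items()
    }

def team_score(profile):
    """Count the attributes for which the team fits the desired composition."""
    return sum(profile.values())

def misses(profile):
    """List the attributes where the team misses the mark."""
    return [attr for attr, ok in profile.items() if not ok]

team = [
    {"english": True, "ux": False, "leader": True},
    {"english": True, "ux": True, "leader": False},
    {"english": True, "ux": False, "leader": False},
]
profile = team_profile(team, RULES)
print(profile, team_score(profile), misses(profile))
```

Swapping in a different `RULES` mapping corresponds to choosing a different desired composition for the same input data.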

Several examples of Boolean attributes have been discussed above. Other attributes, or some of the same attributes, may have numeric values, for example, in a range of 0 to 1. For example, English language proficiency or UX programming skills can be assessed on a numeric scale. A team can be evaluated using these metrics as well. FIG. 25 is a simplified conceptual diagram illustrating analysis of a team profile relative to a preferred numeric distribution. Here, numeric values are scaled and quantized from 0 to 1. A team EC (profile) 2510 shows values (from 0 to 1) for each attribute a, b, c etc. For example, if an attribute of interest is years of formal education, the average or median number of years of education across the team members can be scaled and indicated in a vector. Other attributes like language skills can be used as well as numeric data types. The team attribute values may be collected in a vector 2520, where we illustrate the values graphically like a histogram. A preferred or paradigm distribution for the same set of attributes can be provided, shown as histogram 2530. The preferred distribution may be generated by analysis of a large collection of data, for example, that reflects startup entities' teams and their success or failure several years after they started. The team vector 2520 may be compared to the preferred distribution vector 2530. Here, we see that attribute 2522, for example, in the team vector 2520 has the same value as the corresponding value 2532 in the preferred distribution 2530. The attribute 2524 in the team vector has a lower value 2534 in the preferred distribution 2530. The attribute with value 2526 has a higher value 2536 in the preferred distribution, etc. These differences or “delta” are illustrated as a delta histogram 2540. The closeness or “fit” of the team vector 2520 to the preferred distribution 2530 can be quantified by the delta data. In an embodiment, an area of the rectangles 2542, 2544, and 2546 can be calculated to determine a team score 2550.
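The delta computation and an area-based score can be sketched as follows. This is one possible reading, assuming each delta rectangle has a height equal to the absolute difference for its attribute and a common width of one unit; the function names, the width, and the example values are hypothetical.

```python
def delta_vector(team, preferred):
    """Per-attribute difference: preferred value minus team value."""
    return [p - t for t, p in zip(team, preferred)]

def area_score(deltas, bar_width=1.0):
    """Total area of the delta rectangles; smaller means a closer fit."""
    return sum(abs(d) * bar_width for d in deltas)

# Hypothetical values on the 0-to-1 scale for three attributes:
# the first matches, the second is lower in the preferred distribution,
# and the third is higher in the preferred distribution.
team_vector = [0.8, 0.6, 0.3]
preferred_vector = [0.8, 0.4, 0.7]

deltas = delta_vector(team_vector, preferred_vector)
print(deltas)
print(area_score(deltas))
```

Under this reading, the sign of each delta shows the direction in which the team departs from the preferred value, and a score of zero would indicate an exact match.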

In embodiments, the team score can be used for comparison to other teams. Importantly, the delta data can quickly identify where the team attributes depart from the preferred values. Further, the size of those departures can be reported to help to build a better team.

In viewing and using these metrics, the mutability values discussed above may be taken into consideration. Where a team score is relatively low, but the attributes that contribute to lowering the score are mutable in a positive direction, the score may improve over time. On the other hand, where the mutability values are low or negative, improvement over time is less likely.
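One way a signed mutability value might be derived from repeated observations of an attribute score is sketched below. The formula (average change per observation interval) and the function names are assumptions made for illustration; the disclosure does not specify a particular formula.

```python
def mutability(scores):
    """Average change per observation step; the sign gives the direction."""
    if len(scores) < 2:
        return 0.0  # a single observation shows no change
    return (scores[-1] - scores[0]) / (len(scores) - 1)

def likely_to_improve(scores, threshold=0.0):
    """True when the observed trend is in the positive direction."""
    return mutability(scores) > threshold

# Hypothetical language-skill scores collected at three points in time.
language_skill = [0.2, 0.4, 0.6]
print(mutability(language_skill) > 0)     # True
print(likely_to_improve([0.9, 0.7]))      # False
```

An immutable attribute, such as a date of birth, would simply yield a value of zero under this sketch.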

FIG. 26 is a simplified flow diagram that summarizes the processes described above in one embodiment. Identify data sources and configure data collection, block 2602. Refer to sources 2202 in FIG. 22, for example. Upload raw data for a team member, block 2604. Process the data to synthesize feature data, block 2606. Use the feature data to populate a dataset, block 2610. The dataset may correspond to a dataset 2215 in FIG. 22. If prediction models were previously provisioned at 2628, apply the models to the dataset to generate a score for each attribute or character element for the current team member, block 2620. If such models have not been provisioned, decision 2628, then provide the data to a specialized model builder for each attribute, block 2640, to then generate or update the models, block 2630, and then apply them, block 2620.

Referring further to FIG. 26, apply a mutability analysis to add mutability metrics to the data, 2632. For some cases, the mutability may be predetermined. For example, a date of birth or bachelor degree grant is immutable. A language skill may improve over time. In some embodiments mutability may be inferred by changes in the team member data over time, as data is collected repeatedly over time (see decision 2642 and FIG. 22). Store the team member feature vector in memory, block 2636. If there are more members not yet processed, decision 2642, loop back to 2604 to collect data for the next team member. After all team members are processed, i.e., a No at decision block 2642, proceed on path 2650 to a decision 2651 as to whether to update the input data. In some embodiments, the input data may be updated periodically or according to some other schedule. If an update is indicated, continue to block 2604 to repeat the foregoing steps and acquire new data for each team member.

Otherwise, proceed to block 2644 to combine team member feature vectors to form a team (aggregate) feature vector. Next, compare the team vector to a preferred distribution or composition, block 2646. The differences between the team vector and the preferred composition may be assessed, block 2650, which may include generating an overall team score for ready comparison to other teams. Finally, results reporting at block 2652 may include final team score, problematic attributes, mutability assessment, and other metrics which can be used to predict success of the team, and to improve its composition. Process flow terminates at block 2660.
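The final steps of the flow (combining member vectors, comparing to a preferred composition, and reporting) can be sketched as follows. This is an illustration under stated assumptions: member feature vectors are represented as dictionaries of numeric attribute scores, the combination is a per-attribute mean, and an attribute is flagged as problematic when it departs from the preferred value by more than a chosen tolerance. None of these names or choices is taken from the disclosure.

```python
from statistics import mean

def aggregate(member_vectors):
    """Combine member feature vectors into one team vector (per-attribute mean)."""
    attrs = member_vectors[0].keys()
    return {a: mean(v[a] for v in member_vectors) for a in attrs}

def report(team_vector, preferred, tolerance=0.1):
    """Compare the team vector to the preferred composition and summarize."""
    deltas = {a: preferred[a] - team_vector[a] for a in preferred}
    problematic = sorted(a for a, d in deltas.items() if abs(d) > tolerance)
    score = sum(abs(d) for d in deltas.values())  # smaller is a closer fit
    return {"score": score, "problematic": problematic, "deltas": deltas}

# Hypothetical data for a two-member team.
members = [
    {"education": 0.5, "english": 1.0},
    {"education": 0.7, "english": 0.6},
]
preferred = {"education": 0.6, "english": 1.0}

result = report(aggregate(members), preferred)
print(result["problematic"])
```

A median or one of the Boolean operators could be substituted for the mean on a per-attribute basis, depending on the desired composition for that attribute.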

Referring now to FIG. 27, wherein a block diagram of a computer device suitable for practicing the present disclosure, in accordance with various embodiments, is illustrated. As shown, computer device 2700 may include one or more processors 2702, memory controller 2703, and system memory 2704. Each processor 2702 may include one or more processor cores and/or hardware accelerator 2705. An example of hardware accelerator 2705 may include, but is not limited to, programmed field programmable gate arrays (FPGA). Memory controller 2703 may be any one of a number of memory controllers known in the art. System memory 2704 may include any known volatile or non-volatile memory.

Additionally, computer device 2700 may include mass storage device(s) 2706 (such as solid state drives), input/output device interface 2708 (to interface with various input/output devices, such as, mouse, cursor control, display device (including touch sensitive screen), and so forth) and communication interfaces 2710 (such as network interface cards, modems and so forth). In embodiments, communication interfaces 2710 may support wired or wireless communication, including near field communication. The elements may be coupled to each other via system bus 2712, which may represent one or more buses. In the case of multiple buses, they may be bridged by one or more bus bridges (not shown).

Each of these elements may perform its conventional functions known in the art. In particular, system memory 2704 and mass storage device(s) 2706 may be employed to store a working copy and a permanent copy of the executable code of the programming instructions of an operating system, and one or more applications, collectively referred to as computing logic 2722. The programming instructions may comprise assembler instructions supported by processor(s) 2702 or high-level languages, such as, for example, C, that can be compiled into such instructions. In embodiments, some of computing logic may be implemented in hardware accelerator 2705.

The permanent copy of the executable code of the programming instructions or the bit streams for configuring hardware accelerator 2705 may be placed into permanent mass storage device(s) 2706 in the factory, or in the field, through, for example, a distribution medium (not shown), such as a compact disc (CD), or through communication interface 2710 (from a distribution server (not shown)).

The number, capability and/or capacity of these elements 2710-2712 may vary, depending on the intended use of example computer device 2700, e.g., whether example computer device 2700 is a smartphone, tablet, ultrabook, a laptop, a server, a set-top box, a game console, a camera, and so forth. The constitutions of these elements 2710-2712 are otherwise known, and accordingly will not be further described.

FIG. 28 illustrates an example computer-readable storage medium having instructions configured to implement all (or portion of) the system of FIG. 3, or the software environment shown in FIGS. 4-21, and/or practice (aspects of) the processes of, or referred to in, FIGS. 3-25, respectively, earlier described, in accordance with various embodiments. As illustrated, computer-readable storage medium 2802 may include the executable code of a number of programming instructions or bit streams 2804. Executable code of programming instructions (or bit streams) 2804 may be configured to enable a device, e.g., computer device 2700, in response to execution of the executable code/programming instructions (or operation of an encoded hardware accelerator 2705), to perform (aspects of) the processes of, illustrated within, or referred to in FIGS. 3-25, respectively. In alternate embodiments, executable code/programming instructions/bit streams 2804 may be disposed on multiple non-transitory computer-readable storage media 2802 instead. In embodiments, computer-readable storage medium 2802 may be non-transitory. In still other embodiments, executable code/programming instructions 2804 may be encoded in transitory computer readable medium, such as signals.

Referring back to FIG. 27, for one embodiment, at least one of processors 2702 may be packaged together with a computer-readable storage medium having some or all of computing logic 2722 (in lieu of storing in system memory 2704 and/or mass storage device 2706) configured to practice all or selected ones of the operations earlier described with references to FIGS. 3-25. For one embodiment, at least one of processors 2702 may be packaged together with a computer-readable storage medium having some or all of computing logic 2722 to form a System in Package (SiP). For one embodiment, at least one of processors 2702 may be integrated on the same die with a computer-readable storage medium having some or all of computing logic 2722. For one embodiment, at least one of processors 2702 may be packaged together with a computer-readable storage medium having some or all of computing logic 2722 to form a System on Chip (SoC). For at least one embodiment, the SoC may be utilized in, e.g., but not limited to, a hybrid computing tablet/laptop.

One of skill in the art will recognize that the concepts taught herein can be tailored to a particular application in many other ways. In particular, those skilled in the art will recognize that the illustrated examples are but one of many alternative implementations that will become apparent upon reading this disclosure. It will be obvious to those having skill in the art that many changes may be made to the details of the above-described embodiments without departing from the underlying principles of the invention. The scope of the present invention should, therefore, be determined only by the following claims.

Where the disclosure recites “a” or “a first” element or the equivalent thereof, such disclosure includes one or more such elements, neither requiring nor excluding two or more such elements. Further, ordinal indicators (e.g., first, second or third) for identified elements are used to distinguish between the elements, and do not indicate or imply a required or limited number of such elements, nor do they indicate a particular position or order of such elements unless otherwise specifically stated.

What is claimed:
 1. A method of adaptive and reusable processing of retroactive sequences for automated predictions, comprising: receiving explicit data reporting or indicating activities of a plurality of business entities; receiving other data from which information regarding the activities or level of operations of the business entities may be inferred; using one or more data processors: generating inferred data regarding the business entities from a selected portion of the other data; processing the explicit data and the inferred data to generate a database of at least some of the business entities, entries in the database comprising at least one of partial timelines or sets of sequences of events of the business entities; processing the database to infer lifecycles for the business entities, a lifecycle comprising a final state, and a sequence of active states a business entity passes through prior to the final state; outputting the inferred lifecycles to a user.
 2. The method of claim 1, wherein the final state is one of closed, acquired, initial public offering and acquired to obtain employees.
 3. The method of claim 1, wherein the sequences of active states include one or more of concept, seed, product development, market development and steady status.
 4. The method of claim 3, wherein the sequences of active states conform to a defined series of hidden active states that a business entity may go through, and wherein in the defined series of active states, multiple actual events of the companies are subsumed by at least one of the active states.
 5. The method of claim 1, wherein the final state of the business entities is observed.
 6. The method of claim 1, wherein processing the database to infer lifecycles includes determining, for each of the sequence of active states in the lifecycle, expected values for a set of scores indicative of that active state.
 7. The method of claim 6, further comprising determining a vector that expresses a probability of transitioning from that active state to the final state of the lifecycle.
 8. The method of claim 1, wherein processing the database to infer lifecycles includes associating a Complex Hidden Markov Model (CHMM) with each company.
 9. The method of claim 1, wherein processing the database to infer lifecycles includes reconstructing sequences of potential active states in multiple passes, the passes separated in time from each other, and using different sources of information.
 10. The method of claim 1, wherein lifecycles are generated for groups of companies and for individual companies.
 11. A memory having instructions stored thereon that, in response to execution by a processor, cause the processor to perform operations comprising: uploading digital input data for a person associated with a business entity from at least one data source; provisioning a plurality of feature synthesizers to synthesize feature data from the input data for each attribute of a set of attributes to form at least one dataset of character feature data associated with the person; provisioning a plurality of specialized model builder components, each specialized model builder component configured to build a prediction model for a corresponding one of the attributes; training at least some of the specialized model builder components based at least in part on the synthesized feature data to form respective prediction models for the attributes, respectively; separately applying each of the trained prediction models to at least a portion of the character feature data associated with the person to generate individual vectors corresponding to attributes, respectively, wherein the individual vectors form a feature vector; identifying a plurality of additional feature vectors, each of which includes individual vectors corresponding to the attributes, respectively, for a different person associated with the business entity; combine the individual vectors of the feature vector with the individual feature vectors of the additional feature vectors, respectively, to form an aggregate score vector; and reporting information about the aggregate score vector via a display interface.
 12. The memory of claim 11, wherein reporting the information about the aggregate score vector via the display interface includes generating and displaying a visualization including graphical elements for individual vectors of the aggregate score vector, respectively, wherein areas of the graphical elements are sized to represent data taken from the corresponding individual vector of the aggregate score vector.
 13. The memory of claim 11, wherein the operations further comprise: repeating the uploading and synthesizing steps to acquire new data; separately applying each of the trained prediction models to the new data to form a new individual vector for each one of the attributes for the person; and comparing the new individual vectors to the individual vectors of the feature vector to determine mutability of at least one of the attributes.
 14. The memory of claim 11, wherein the operations further comprise: comparing the aggregate score vector to a predetermined preferred distribution; and reporting a result of the comparison via the display interface.
 15. The memory of claim 11, wherein the operations further comprise: identifying attributes of the aggregate score vector that deviate from a predetermined preferred distribution; and reporting which attributes of the team aggregate score vector deviate from the predetermined preferred distribution.
 16. The memory of claim 11, wherein the operations further comprise: in forming the business entity score, separately for each attribute, combining the corresponding individual scores based on a predetermined Boolean operator selected to reflect a desired composition of the business entity with regard to the corresponding attribute.
 17. The memory of claim 16, wherein a Boolean AND operator is applied to combine the corresponding individual scores for an attribute that is required to be true according to a predetermined preferred distribution.
 18. The memory of claim 16, wherein a Boolean XOR operator is applied to combine the corresponding individual scores for an attribute that is required to be true according to a predetermined preferred distribution.
 19. The memory of claim 11, wherein the operations further comprise: assessing a mutability metric for at least one of the attribute for a team member, based on changes in the corresponding attribute score over time, wherein the mutability metric comprises a numeric value and a sign indicating a direction of change of the corresponding scores.
 20. The memory of claim 19, wherein the operations further comprise including the mutability metric in the corresponding attribute score in the first feature vector.